-
Notifications
You must be signed in to change notification settings - Fork 28
Tune async insert buffer #284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughDefault values for several ClickHouse-related CLI flags were switched to true, enabling async inserts and compression by default. When async inserts are enabled, additional ClickHouse settings (busy timeout, max data size, max query number) are applied during storage configuration. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant CLI as CLI Flags
participant Config as Config Loader
participant Orchestrator as Storage Orchestrator
participant CH as ClickHouse Client
CLI->>Config: Load defaults (asyncInsert=true, compression=true)
Config->>Orchestrator: Provide storage config
alt AsyncInsert enabled
Orchestrator->>CH: Apply async_insert_* settings<br/>(timeout=3000ms, max_data=100MB, max_queries=100k)
Orchestrator->>CH: Insert (async)
note right of CH: Server queues async inserts
else AsyncInsert disabled
Orchestrator->>CH: Insert (sync)
end
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes ✨ Finishing Touches
🧪 Generate unit tests
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
internal/storage/clickhouse.go (1)
130-133: Use canonical setting name, numeric types, and surface these values via config
- Prefer the canonical key
async_insert_busy_timeout_max_ms(alias ofasync_insert_busy_timeout_ms) to match docs and avoid confusion. (github.com, fossies.org)- Pass numeric values (ints) instead of strings for numeric settings; the driver accepts interface{} and this removes quoting ambiguity.
- Optional: log the enabled async-insert parameters once to aid runtime diagnostics.
- Optional: consider exposing these as cfg fields (with CLI flags) so they can be tuned per env without code changes.
Apply:
- settings["async_insert_busy_timeout_ms"] = "3000" // 3 seconds - settings["async_insert_max_data_size"] = "104857600" // 100MB - settings["async_insert_max_query_number"] = "100000" // 100k + // Canonical name; numeric values + settings["async_insert_busy_timeout_max_ms"] = 3000 // 3s + settings["async_insert_max_data_size"] = 100 << 20 // 100 MiB + settings["async_insert_max_query_number"] = 100_000 + zLog.Debug(). + Int("async_insert_busy_timeout_ms", 3000). + Int("async_insert_max_data_size", 100<<20). + Int("async_insert_max_query_number", 100_000). + Msg("ClickHouse async insert enabled")Note: Keeping
wait_for_async_insert = 1is aligned with ClickHouse guidance for production reliability. (clickhouse.com)cmd/root.go (1)
83-83: Behavior change: asyncInsert and compression default to true — plan rollout and safeguardsGreat defaults for throughput, but this flips behavior for users relying on previous defaults.
Actions:
- Announce in release notes and config docs; include opt-out examples (flags/env).
- Validate ClickHouse capacity (CPU for LZ4, memory for async buffers) before broad enablement.
- Consider adding flags for async-insert tunables (busy timeout/data size/query number) to complement the new defaults.
- Add a feature flag or env (e.g., TW_DISABLE_CLICKHOUSE_ASYNC_DEFAULTS) if you need a quick rollback path.
Checklist to run before merge:
- Staging soak with these defaults on realistic write rates; watch
system.asynchronous_insert_log, insert latencies, parts/sec, merges, and memory.- Confirm no unexpected changes to ingest error semantics due to
wait_for_async_insert=1.Also applies to: 89-89, 106-106, 112-112, 117-117, 123-123, 201-201
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
cmd/root.go(3 hunks)internal/storage/clickhouse.go(1 hunks)
| settings["async_insert_busy_timeout_ms"] = "3000" // 3 seconds | ||
| settings["async_insert_max_data_size"] = "104857600" // 100MB | ||
| settings["async_insert_max_query_number"] = "100000" // 100k | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Double-check interaction with Native batch inserts
This connector predominantly uses PrepareBatch (Native protocol). Please verify your ClickHouse version and driver behavior to ensure async_insert settings apply as intended to your insert path; if not, these settings won’t influence batch.Send() semantics. If needed, document the expected behavior or gate settings by mode.
You can confirm server-side support and effective values with:
-- Run on the target cluster as the same user this app uses
SELECT name, value, changed, origin
FROM system.settings
WHERE name IN (
'async_insert','wait_for_async_insert',
'async_insert_busy_timeout_max_ms','async_insert_max_data_size','async_insert_max_query_number'
);Also check that your ClickHouse version documents these settings:
🌐 Web query:
Verify whether ClickHouse async_insert settings (busy_timeout_max_ms, max_data_size, max_query_number) apply to Native protocol inserts, and whether busy_timeout_ms is aliased to busy_timeout_max_ms.
💡 Result:
Short answers:
-
The async_insert settings (async_insert_busy_timeout_max_ms, async_insert_max_data_size, async_insert_max_query_number) control ClickHouse's asynchronous-insert buffering/flush behavior — they apply to async INSERT processing, not to every insert mode. [1][2]
-
Native/binary-protocol inserts do not universally use the async_insert path: many ClickHouse clients/SDKs only support async_insert when sending SQL/text INSERTs (the native Insert API in some clients bypasses async buffering). See the clickhouse-cpp notes that async inserts work for SQL/text but the native Block Insert path does not use it. [2][4]
-
async_insert_busy_timeout_ms is an alias for async_insert_busy_timeout_max_ms. [3]
Sources:
- ClickHouse settings docs (async_insert, async_insert_max_data_size, async_insert_max_query_number, busy_timeout descriptions). [clickhouse.com docs].
- clickhouse-cpp GitHub README/issues explaining async_insert requires SQL/text inserts and native Insert() (binary Block) doesn’t use async insert. [github.com/ClickHouse/clickhouse-cpp].
- ClickHouse source / settings history showing async_insert_busy_timeout_ms aliased to async_insert_busy_timeout_max_ms. [fossies / ClickHouse source].
Document or gate async_insert settings for native protocol
Current async_insert settings (async_insert_busy_timeout_ms/async_insert_busy_timeout_max_ms, async_insert_max_data_size, async_insert_max_query_number) apply only to ClickHouse’s async INSERT (SQL/text) path and are ignored by the native binary protocol used by PrepareBatch/batch.Send(). Remove or conditionally apply these settings for the native batch insert path, or clearly document that they have no effect in native mode.
🤖 Prompt for AI Agents
In internal/storage/clickhouse.go around lines 130 to 133, the async_insert_*
settings are being applied unconditionally but only affect ClickHouse’s SQL/text
async INSERT path and are ignored by the native binary protocol used by
PrepareBatch/batch.Send(); either stop applying these settings when using the
native client or add a clear comment documenting they have no effect in native
mode. Change the logic to detect the client/protocol (native vs SQL/text) and
only set
async_insert_busy_timeout_ms/async_insert_max_data_size/async_insert_max_query_number
for the SQL/text path, otherwise omit them; if detection is not practical, add
an inline comment and documentation near this block stating these settings do
not apply to native batch inserts so maintainers won’t expect them to affect
PrepareBatch/batch.Send().
Summary by CodeRabbit